Computer and Modernization ›› 2012, Vol. 1 ›› Issue (11): 171-176. doi: 10.3969/j.issn.1006-2475.2012.11.042

• Network and Communication •

Research and Implementation of Distributed Full-text Retrieval System Based on Solr

LI Dai-wei, LI Ning

  1. Department of Information Technology and Application System, North China Institute of Computing Technology, Beijing 100083, China
  • Received: 2012-07-13 Online: 2012-11-10 Published: 2012-11-10

Abstract: With the rapid growth of online information resources, traditional retrieval systems struggle to provide efficient and reliable service when processing massive volumes of data. To address this, the paper designs and implements a distributed full-text retrieval system based on Solr. The system uses a Web crawler to fetch web pages and stores the fetched content as text files. Solr's index processing module then builds indexes in parallel on multiple computer nodes, which effectively increases indexing speed. The cluster is managed through Zookeeper, and the search module is designed to be distributed, which effectively improves retrieval performance. Finally, a user-friendly interface is provided. The system currently runs stably at the million-record scale and has strong practical value.

Key words: full-text search, Solr, distributed, Zookeeper
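
The pipeline summarized in the abstract (partition the crawled text, index the partitions in parallel, then query all partitions and merge the results) can be illustrated with a small in-memory sketch. This is not the authors' implementation and does not call Solr or Zookeeper: the shard count, the CRC32 routing rule, the term-frequency score, and every function name below are assumptions made for illustration, and threads stand in for the separate computer nodes.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 3  # hypothetical cluster size, not taken from the paper


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a document to a shard using a stable hash of its ID (assumed rule)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_index(docs: dict) -> dict:
    """Build a tiny inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def index_in_parallel(docs: dict, num_shards: int = NUM_SHARDS) -> list:
    """Partition documents across shards, then index each partition concurrently."""
    partitions = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        partitions[shard_for(doc_id, num_shards)][doc_id] = text
    # Threads stand in for the separate index nodes of a real cluster.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        return list(pool.map(build_index, partitions))


def search(shards: list, term: str, top_k: int = 10) -> list:
    """Send the query to every shard, then merge hits by descending score."""
    term = term.lower()
    hits = []
    for index in shards:
        hits.extend(index.get(term, {}).items())
    # Sort by score (term frequency) descending, then by ID for a stable order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:top_k]


if __name__ == "__main__":
    pages = {
        "page-1": "solr distributed search with zookeeper",
        "page-2": "full-text search search engine",
        "page-3": "web crawler stores pages as text files",
    }
    shards = index_in_parallel(pages)
    print(search(shards, "search"))
```

Each document lands on exactly one shard, so the merge step only has to concatenate and re-sort the per-shard hits. A production deployment would leave routing, scoring, and node coordination to Solr and Zookeeper themselves.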
